Post-Quantization Gain Correction in Audio Coding

ABSTRACT

A gain adjustment apparatus for use in decoding of audio that has been encoded with separate gain and shape representations includes an accuracy meter configured to estimate an accuracy measure of the shape representation, and to determine a gain correction based on the estimated accuracy measure. An envelope adjuster further included in the apparatus is configured to adjust the gain representation based on the determined gain correction.

RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/565,920 filed 10 Sep. 2019, which is a continuation of U.S. application Ser. No. 15/668,766 filed 4 Aug. 2017, now U.S. Pat. No. 10,460,739, which is a continuation of U.S. application Ser. No. 14/002,509 filed 30 Aug. 2013, now U.S. Pat. No. 10,121,481, which is a U.S. National Phase Application of PCT/SE2011/050899 filed 4 Jul. 2011, which claims benefit of U.S. Provisional Application No. 61/449,230 filed 4 Mar. 2011. The entire contents of each aforementioned application are incorporated herein by reference.

TECHNICAL FIELD

The present technology relates to gain correction in audio coding based on quantization schemes where the quantization is divided into a gain representation and a shape representation, so called gain-shape audio coding, and especially to post-quantization gain correction.

BACKGROUND

Modern telecommunication services are expected to handle many different types of audio signals. While the main audio content is speech signals, there is a desire to handle more general signals such as music and mixtures of music and speech. Although the capacity in telecommunication networks is continuously increasing, it is still of great interest to limit the required bandwidth per communication channel. In mobile networks, smaller transmission bandwidths for each call yield lower power consumption in both the mobile device and the base station. This translates to energy and cost saving for the mobile operator, while the end user will experience prolonged battery life and increased talk-time. Further, with less consumed bandwidth per user, the mobile network can service a larger number of users in parallel.

Today, the dominating compression technology for mobile voice services is CELP (Code Excited Linear Prediction), which achieves good audio quality for speech at low bandwidths. It is widely used in deployed codecs such as AMR (Adaptive MultiRate), AMR-WB (Adaptive MultiRate WideBand) and GSM-EFR (Global System for Mobile communications - Enhanced FullRate). However, for general audio signals such as music the CELP technology has poor performance. These signals can often be better represented by using frequency transform based coding, for example, the ITU-T codecs G.722.1 [1] and G.719 [2]. However, transform domain codecs generally operate at a higher bitrate than the speech codecs. There is a gap between the speech and general audio domains in terms of coding, and it is desirable to increase the performance of transform domain codecs at lower bitrates.

Transform domain codecs require a compact representation of the frequency domain transform coefficients. These representations often rely on vector quantization (VQ), where the coefficients are encoded in groups. Among the various methods for vector quantization is the gain-shape VQ. This approach applies normalization to the vectors before encoding the individual coefficients. The normalization factor and the normalized coefficients are referred to as the gain and the shape of the vector, which may be encoded separately. The gain-shape structure has many benefits. By dividing the gain and the shape, the codec can easily be adapted to varying source input levels by designing the gain quantizer. It is also beneficial from a perceptual perspective where the gain and shape may carry different importance in different frequency regions. Finally, the gain-shape division simplifies the quantizer design and makes it less complex in terms of memory and computational resources compared to an unconstrained vector quantizer. A functional overview of a gain-shape quantizer can be seen in FIG. 1.

If applied to a frequency domain spectrum, the gain-shape structure can be used to form a spectral envelope and fine structure representation. The sequence of gain values forms the envelope of the spectrum while the shape vectors give the spectral detail. From a perceptual perspective, it is beneficial to partition the spectrum using a non-uniform band structure which follows the frequency resolution of the human auditory system. This generally means that narrow bandwidths are used for low frequencies while larger bandwidths are used for high frequencies. The perceptual importance of the spectral fine structure varies with the frequency but is also dependent on the characteristics of the signal itself. Transform coders often employ an auditory model to determine the important parts of the fine structure and assign the available resources to the most important parts. The spectral envelope is often used as input to this auditory model. The shape encoder quantizes the shape vectors using the assigned bits. See FIG. 2 for an example of a transform based coding system with an auditory model.

Depending on the accuracy of the shape quantizer, the gain value used to reconstruct the vector may be more or less appropriate. Especially when the allocated bits are few, the gain value drifts away from the optimal value. One way to solve this is to encode a correcting factor which accounts for the gain mismatch after the shape quantization. Another solution is to encode the shape first and then compute the optimal gain factor given the quantized shape.

The solution to encode a gain correction factor after shape quantization may consume considerable bitrate. If the rate is already low, this means more bits have to be taken elsewhere and may perhaps reduce the available bitrate for the fine structure.

To encode the shape before encoding the gain is a better solution, but if the bitrate for the shape quantizer is decided from the quantized gain value, then the gain and shape quantization would depend on each other. An iterative solution could likely solve this co-dependency, but it could easily become too complex to run in real-time on a mobile device.

SUMMARY

An object is to obtain a gain adjustment in decoding of audio that has been encoded with separate gain and shape representations.

This object is achieved in accordance with the attached claims.

A first aspect involves a gain adjustment method that includes the following steps:

-   An accuracy measure of the shape representation is estimated.
-   A gain correction is determined based on the estimated accuracy measure.
-   The gain representation is adjusted based on the determined gain correction.

A second aspect involves a gain adjustment apparatus that includes:

-   An accuracy meter configured to estimate an accuracy measure of the shape representation and to determine a gain correction based on the estimated accuracy measure.
-   An envelope adjuster configured to adjust the gain representation based on the determined gain correction.

A third aspect involves a decoder including a gain adjustment apparatus in accordance with the second aspect.

A fourth aspect involves a network node including a decoder in accordance with the third aspect.

The proposed scheme for gain correction improves the perceived quality of a gain-shape audio coding system. The scheme has low computational complexity and requires few, if any, additional bits.

BRIEF DESCRIPTION OF THE DRAWINGS

The present technology, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:

FIG. 1 illustrates an example gain-shape vector quantization scheme;

FIG. 2 illustrates an example transform domain coding and decoding scheme;

FIG. 3A-C illustrates gain-shape vector quantization in a simplified case;

FIG. 4 illustrates an example transform domain decoder using an accuracy measure to determine an envelope correction;

FIG. 5A-B illustrates an example result of scaling the synthesis with gain factors when the shape vector is a sparse pulse vector;

FIG. 6A-B illustrates how the largest pulse height can indicate the accuracy of the shape vector;

FIG. 7 illustrates an example of a rate based attenuation function for embodiment 1;

FIG. 8 illustrates an example of a rate and maximum pulse height dependent gain adjustment function for embodiment 1;

FIG. 9 illustrates another example of a rate and maximum pulse height dependent gain adjustment function for embodiment 1;

FIG. 10 illustrates an embodiment of the present technology in the context of an MDCT based audio coder and decoder system;

FIG. 11 illustrates an example of a mapping function from the stability measure to the gain adjustment limitation factor;

FIG. 12 illustrates an example of an ADPCM encoder and decoder system with an adaptive step size;

FIG. 13 illustrates an example in the context of a subband ADPCM based audio coder and decoder system;

FIG. 14 illustrates an embodiment of the present technology in the context of a subband ADPCM based audio coder and decoder system;

FIG. 15 illustrates an example transform domain encoder including a signal classifier;

FIG. 16 illustrates another example transform domain decoder using an accuracy measure to determine an envelope correction;

FIG. 17 illustrates an embodiment of a gain adjustment apparatus in accordance with the present technology;

FIG. 18 illustrates an embodiment of gain adjustment in accordance with the present technology in more detail;

FIG. 19 is a flow chart illustrating the method in accordance with the present technology;

FIG. 20 is a flow chart illustrating an embodiment of the method in accordance with the present technology; and

FIG. 21 illustrates an embodiment of a network in accordance with the present technology.

DETAILED DESCRIPTION

In the following description the same reference designations will be used for elements performing the same or similar function.

Before the present technology is described in detail, gain-shape coding will be illustrated with reference to FIG. 1-3.

FIG. 1 illustrates an example gain-shape vector quantization scheme. The upper part of the figure illustrates the encoder side. An input vector x is forwarded to a norm calculator 10, which determines the vector norm (gain) g, typically the Euclidean norm. This exact norm is quantized in a norm quantizer 12, and the inverse 1/ĝ of the quantized norm ĝ is forwarded to a multiplier 14 for scaling the input vector x into a shape. The shape is quantized in a shape quantizer 16. Representations of the quantized gain and shape are forwarded to a bitstream multiplexer (mux) 18. These representations are illustrated by dashed lines to indicate that they may, for example, constitute indices into tables (code books) rather than the actual quantized values.

The lower part of FIG. 1 illustrates the decoder side. A bitstream demultiplexer (demux) 20 receives the gain and shape representations. The shape representation is forwarded to a shape dequantizer 22, and the gain representation is forwarded to a gain dequantizer 24. The obtained gain ĝ is forwarded to a multiplier 26, where it scales the obtained shape, which gives the reconstructed vector x̃.

FIG. 2 illustrates an example transform domain coding and decoding scheme. The upper part of the figure illustrates the encoder side. An input signal is forwarded to a frequency transformer 30, for example, based on the Modified Discrete Cosine Transform (MDCT), to produce the frequency transform X. The frequency transform X is forwarded to an envelope calculator 32, which determines the energy E(b) of each frequency band b. These energies are quantized into energies Ê(b) in an envelope quantizer 34. The quantized energies Ê(b) are forwarded to an envelope normalizer 36, which scales the coefficients of frequency band b of the transform X with the inverse of the corresponding quantized energy Ê(b) of the envelope. The resulting scaled shapes are forwarded to a fine structure quantizer 38. The quantized energies Ê(b) are also forwarded to a bit allocator 40, which allocates bits for fine structure quantization to each frequency band b. As noted above, the bit allocation R(b) may be based on a model of the human auditory system. Representations of the quantized gains Ê(b) and corresponding quantized shapes are forwarded to bitstream multiplexer 18.

The lower part of FIG. 2 illustrates the decoder side. The bitstream demultiplexer 20 receives the gain and shape representations. The gain representations are forwarded to an envelope dequantizer 42. The generated envelope energies Ê(b) are forwarded to a bit allocator 44, which determines the bit allocation R(b) of the received shapes. The shape representations are forwarded to a fine structure dequantizer 46, which is controlled by the bit allocation R(b). The decoded shapes are forwarded to an envelope shaper 48, which scales them with the corresponding envelope energies Ê(b) to form a reconstructed frequency transform. This transform is forwarded to an inverse frequency transformer 50, for example, based on the Inverse Modified Discrete Cosine Transform (IMDCT), which produces an output signal representing synthesized audio.

FIG. 3A-C illustrates gain-shape vector quantization described above in a simplified case where the frequency band b is represented by the 2-dimensional vector X(b) in FIG. 3A. This case is simple enough to be illustrated in a drawing, but also general enough to illustrate the problem with gain-shape quantization (in practice the vectors typically have 8 or more dimensions). The right hand side of FIG. 3A illustrates an exact gain-shape representation of the vector X(b) with a gain E(b) and a shape (unit length vector) N′(b).

However, as illustrated in FIG. 3B, the exact gain E(b) is encoded into a quantized gain Ê(b) on the encoder side. Since the inverse of the quantized gain Ê(b) is used for scaling of the vector X(b), the resulting scaled vector N(b) will point in the correct direction, but will not necessarily be of unit length. During shape quantization, the scaled vector N(b) is quantized into the quantized shape N̂(b). In this case, the quantization is based on a pulse coding scheme [3], which constructs the shape (or direction) from a sum of signed integer pulses. The pulses may be added on top of each other for each dimension. This means that the allowed shape quantization positions are represented by the large dots in the rectangular grids illustrated in FIG. 3B-C. The result is that the quantized shape N̂(b) will in general not coincide with the shape (direction) of N(b) (and N′(b)).

FIG. 3C illustrates that the accuracy of the shape quantization depends on the allocated bits R(b), or equivalently the total number of pulses available for shape quantization. In the left part of FIG. 3C the shape quantization is based on 8 pulses, whereas the shape quantization in the right part uses only 3 pulses (the example in FIG. 3B uses 4 pulses).

Thus, it is appreciated that depending on the accuracy of the shape quantizer, the gain value Ê(b) used to reconstruct the vector X(b) on the decoder side may be more or less appropriate. In accordance with the present technology, a gain correction can be based on an accuracy measure of the quantized shape.

The accuracy measure used to correct the gain may be derived from parameters already available in the decoder, but it may also depend on additional parameters designated for the accuracy measure. Typically, the parameters would include the number of allocated bits for the shape vector and the shape vector itself, but it may also include the gain value associated with the shape vector and pre-stored statistics about the signals that are typical for the encoding and decoding system. An overview of a system incorporating an accuracy measure and gain correction or adjustment is shown in FIG. 4.

FIG. 4 illustrates an example transform domain decoder 300 using an accuracy measure to determine an envelope correction. In order to avoid cluttering of the drawing, only the decoder side is illustrated. The encoder side may be implemented as in FIG. 2. The new feature is a gain adjustment apparatus 60. The gain adjustment apparatus 60 includes an accuracy meter 62 configured to estimate an accuracy measure A(b) of the shape representation N̂(b), and to determine a gain correction g_(c)(b) based on the estimated accuracy measure A(b). It also includes an envelope adjuster 64 configured to adjust the gain representation Ê(b) based on the determined gain correction.

As indicated above, the gain correction may in some embodiments be performed without spending additional bits. This is done by estimating the gain correction from parameters already available in the decoder. This process can be described as an estimation of the accuracy of the encoded shape. Typically, this estimation includes deriving the accuracy measure A(b) from shape quantization characteristics indicating the resolution of the shape quantization.

Embodiment 1

In one embodiment, the present technology is used in an audio encoder/decoder system. The system is transform based and the transform used is the Modified Discrete Cosine Transform (MDCT) using sinusoidal windows with 50% overlap. However, it is understood that any transform suitable for transform coding may be used together with appropriate segmentation and windowing.

Encoder of Embodiment 1

The input audio is extracted into frames using 50% overlap and windowed with a symmetric sinusoidal window. Each windowed frame is then transformed to an MDCT spectrum X. The spectrum is partitioned into subbands for processing, where the subband widths are non-uniform. The spectral coefficients of frame m belonging to band b are denoted X(b,m) and have the bandwidth BW(b). Since most encoder and decoder steps can be described within one frame, we omit the frame index and just use the notation X(b). The bandwidths should preferably increase with increasing frequency to comply with the frequency resolution of the human auditory system. The root-mean-square (RMS) value of each band is used as a normalization factor and is denoted E(b):

$\begin{matrix}{{E(b)} = \sqrt{\frac{{X(b)}^{T}{X(b)}}{{BW}(b)}}} & (1)\end{matrix}$

where X(b)^(T) denotes the transpose of X(b).

The RMS value can be seen as the energy value per coefficient. The sequence of normalization factors E(b) for b=1, 2, . . . , N_(bands) forms the envelope of the MDCT spectrum, where N_(bands) denotes the number of bands. Next, the sequence is quantized in order to be transmitted to the decoder. To ensure that the normalization can be reversed in the decoder, the quantized envelope Ê(b) is obtained. In this example embodiment the envelope coefficients are scalar quantized in log domain using a step size of 3 dB and the quantizer indices are differentially encoded using Huffman coding. The quantized envelope is used for normalization of the spectral bands, i.e.:

$\begin{matrix}{{N(b)} = {\frac{1}{\hat{E}(b)}{X(b)}}} & (2)\end{matrix}$

Note that if the non-quantized envelope E(b) is used for normalization, the shape would have RMS=1, i.e.:

$\begin{matrix}{{N^{\prime}(b)} = {{\frac{1}{E(b)} {X(b)}\ \Longrightarrow\sqrt{\frac{{N^{\prime}(b)}^{T}{N^{\prime}(b)}}{B{W(b)}}} } = 1}} & (3)\end{matrix}$

By using the quantized envelope Ê(b), the shape vector will have an RMS value close to 1. This feature will be used in the decoder to create an approximation of the gain value.
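The envelope steps of equations (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the codec's implementation: the function names are hypothetical, the 3 dB step is assumed to apply to 20·log10 of the RMS value, the band is assumed to be non-silent, and the differential Huffman coding of the indices is left out.

```python
import math

def band_rms(coeffs):
    """Equation (1): RMS value E(b) of the coefficients X(b) of one band."""
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))

def quantize_envelope_db(e, step_db=3.0):
    """Scalar-quantize E(b) in the log domain (assumes e > 0).

    Assumption: the step size refers to 20*log10(E(b)).
    Returns the quantizer index and the quantized value E_hat(b).
    """
    level_db = 20.0 * math.log10(e)
    index = round(level_db / step_db)
    return index, 10.0 ** (index * step_db / 20.0)

def normalize_band(coeffs, envelope_value):
    """Equation (2): scale the band with the inverse of the envelope value."""
    return [c / envelope_value for c in coeffs]

# Hypothetical 8-coefficient band.
x_b = [0.9, -2.1, 0.4, 1.7, -0.3, 0.0, 2.6, -1.2]
e_b = band_rms(x_b)
index, e_hat_b = quantize_envelope_db(e_b)
n_b = normalize_band(x_b, e_hat_b)
shape_rms = band_rms(n_b)  # close to, but generally not exactly, 1
```

With a 3 dB step the RMS of the normalized shape stays within 1.5 dB of 1, which is the property the decoder later relies on.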

The union of the normalized shape vectors N(b) forms the fine structure of the MDCT spectrum. The quantized envelope is used to produce a bit allocation R(b) for encoding of the normalized shape vectors N(b). The bit allocation algorithm preferably uses an auditory model to distribute the bits to the perceptually most relevant parts. Any quantizer scheme may be used for encoding the shape vector. Common for all is that they may be designed under the assumption that the input is normalized, which simplifies quantizer design. In this embodiment the shape quantization is done using a pulse coding scheme which constructs the synthesis shape from a sum of signed integer pulses [3]. The pulses may be added on top of each other to form pulses of different height. In this embodiment the bit allocation R(b) denotes the number of pulses assigned to band b.

The quantizer indices from the envelope quantization and shape quantization are multiplexed into a bitstream to be stored or transmitted to a decoder.

Decoder of Embodiment 1

The decoder demultiplexes the indices from the bitstream and forwards the relevant indices to each decoding module. First, the quantized envelope Ê(b) is obtained. Next, the fine structure bit allocation is derived from the quantized envelope using a bit allocation identical to the one used in the encoder. The shape vectors N̂(b) of the fine structure are decoded using the indices and the obtained bit allocation R(b).

Now, before scaling the decoded fine structure with the envelope, additional gain correction factors are determined. First, the RMS matching gain is obtained as:

$\begin{matrix}{{g_{RMS}(b)} = \sqrt{\frac{{BW}(b)}{{\hat{N}(b)}^{T}{\hat{N}(b)}}}} & (4)\end{matrix}$

The g_(RMS)(b) factor is a scaling factor that normalizes the RMS value to 1, i.e.:

$\begin{matrix}{\sqrt{\frac{\left( {g_{RMS}(b)}{\hat{N}(b)} \right)^{T}\left( {g_{RMS}(b)}{\hat{N}(b)} \right)}{{BW}(b)}} = 1} & (5)\end{matrix}$
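As a quick numerical check of equations (4) and (5), the sketch below computes the RMS matching gain for a hypothetical decoded pulse vector and confirms that the scaled vector has unit RMS. The function name and the handling of an all-zero shape are assumptions made for illustration.

```python
import math

def rms_matching_gain(shape_hat):
    """Equation (4): g_RMS(b) = sqrt(BW(b) / (N_hat^T N_hat))."""
    energy = sum(p * p for p in shape_hat)
    if energy == 0:
        # Assumption: an all-zero shape (no pulses) gets zero gain.
        return 0.0
    return math.sqrt(len(shape_hat) / energy)

# Hypothetical decoded shape with BW(b) = 8 and 4 pulses.
n_hat = [1, 0, 0, -2, 0, 1, 0, 0]
g_rms = rms_matching_gain(n_hat)
scaled = [g_rms * p for p in n_hat]
scaled_rms = math.sqrt(sum(s * s for s in scaled) / len(scaled))
```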

In this embodiment we seek to minimize the mean squared error (MSE) of the synthesis:

$\begin{matrix}{{g_{MSE}(b)} = {\underset{g}{\arg\min}\,\left\| {N(b)} - {g \cdot {\hat{N}(b)}} \right\|^{2}}} & (6)\end{matrix}$

with the solution

$\begin{matrix}{{g_{MSE}(b)} = \frac{{\hat{N}(b)}^{T}{N(b)}}{{\hat{N}(b)}^{T}{\hat{N}(b)}}} & (7)\end{matrix}$

Since g_(MSE)(b) depends on the input shape N(b), it is not known in the decoder. In this embodiment the impact is estimated by using an accuracy measure. The ratio of these gains is defined as a gain correction factor g_(c)(b):

$\begin{matrix}{{g_{c}(b)} = \frac{g_{MSE}(b)}{g_{RMS}(b)}} & (8)\end{matrix}$

When the accuracy of the shape quantization is good, the correction factor is close to 1, i.e.:

N̂(b)→N(b)⇒g_(c)(b)→1  (9)

However, when the accuracy of N̂(b) is low, g_(MSE)(b) and g_(RMS)(b) will diverge. In this embodiment, where the shape is encoded using a pulse coding scheme, a low rate will make the shape vector sparse and g_(RMS)(b) will give an overestimate of the appropriate gain in terms of MSE. For this case g_(c)(b) should be lower than 1 to compensate for the overshoot. See FIG. 5A-B for an example illustration of the low rate pulse shape case. FIG. 5A-B illustrates an example of scaling the synthesis with g_(MSE) (FIG. 5B) and g_(RMS) (FIG. 5A) gain factors when the shape vector is a sparse pulse vector. The g_(RMS) scaling gives pulses that are too high in an MSE sense.
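The overshoot can be reproduced with a small numerical sketch. The vectors below are hypothetical, and the MSE gain is computed as the least-squares minimizer of the error in equation (6), that is, the inner product of N̂(b) and N(b) divided by the energy of N̂(b). This gain is only available where the input shape is known, so the example is an encoder-side illustration and not something a decoder could run.

```python
import math

def rms_matching_gain(shape_hat):
    """Equation (4): gain that gives the scaled shape unit RMS."""
    return math.sqrt(len(shape_hat) / sum(p * p for p in shape_hat))

def mse_optimal_gain(shape, shape_hat):
    """Least-squares gain g minimizing ||N - g * N_hat||^2."""
    num = sum(q * n for q, n in zip(shape_hat, shape))
    den = sum(q * q for q in shape_hat)
    return num / den

def squared_error(shape, shape_hat, gain):
    return sum((n - gain * q) ** 2 for n, q in zip(shape, shape_hat))

# Hypothetical input shape (RMS close to 1) coded with only 3 pulses.
n = [1.2, -0.8, 0.9, 1.1, -1.0, 0.7, 1.3, -0.9]
n_hat = [1, 0, 0, 0, -1, 0, 1, 0]

g_rms = rms_matching_gain(n_hat)
g_mse = mse_optimal_gain(n, n_hat)
g_c = g_mse / g_rms  # equation (8); below 1 for this sparse shape

err_rms = squared_error(n, n_hat, g_rms)
err_mse = squared_error(n, n_hat, g_mse)
```

For these vectors the correction factor comes out at roughly 0.71, so scaling with g_RMS alone would place the three pulses well above their MSE-optimal height.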

On the other hand, a peaky or sparse target signal can be well represented with a pulse shape. While the sparseness of the input signal may not be known in the synthesis stage, the sparseness of the synthesis shape may serve as an indicator of the accuracy of the synthesized shape vector. One way to measure the sparseness of the synthesis shape is the height of the maximum peak in the shape. The reasoning behind this is that a sparse input signal is more likely to generate high peaks in the synthesis shape. See FIGS. 6A-B for an illustration of how the peak height can indicate the accuracy of two equal rate pulse vectors.

In FIG. 6A there are 5 pulses available (R(b)=5) to represent the dashed shape. Since the shape is rather constant, the coding generated 5 distributed pulses of equal height 1, i.e. p_(max)=1. In FIG. 6B there are also 5 pulses available to represent the dashed shape. However, in this case the shape is peaky or sparse, and the largest peak is represented by 3 pulses on top of each other, i.e. p_(max)=3. This indicates that the gain correction g_(c)(b) depends on an estimated sparseness p_(max) of the quantized shape.
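Both quantities in this discussion can be read directly off the decoded pulse vector. The sketch below uses two hypothetical 5-pulse vectors in the spirit of FIG. 6A-B (they are not the vectors drawn in the figures), and the helper names are assumptions.

```python
def pulse_count(shape_hat):
    """Total number of unit pulses R(b) used by a signed integer pulse vector."""
    return sum(abs(p) for p in shape_hat)

def max_pulse_height(shape_hat):
    """p_max(b): height of the largest pulse stack, ignoring sign."""
    return max(abs(p) for p in shape_hat)

flat_shape = [1, 0, 1, 0, -1, 0, 1, 1]    # 5 spread-out pulses
peaky_shape = [0, 0, 3, 0, -1, 0, 1, 0]   # 5 pulses, 3 of them stacked

flat_stats = (pulse_count(flat_shape), max_pulse_height(flat_shape))
peaky_stats = (pulse_count(peaky_shape), max_pulse_height(peaky_shape))
```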

As noted above, the input shape N(b) is not known by the decoder. Since g_(MSE)(b) depends on the input shape N(b), this means that the gain correction or compensation g_(c)(b) can in practice not be based on the ideal equation (8). In this embodiment the gain correction g_(c)(b) is instead decided based on the bit-rate in terms of the number of pulses R(b), the height of the largest pulse in the shape vector p_(max)(b) and the frequency band b, i.e.:

g_(c)(b)=f(R(b),p_(max)(b),b)  (10)

It has been observed that the lower rates generally require an attenuation of the gain to minimize the MSE. The rate dependency may be implemented as a lookup table t(R(b)) which is trained on relevant audio signal data. An example lookup table can be seen in FIG. 7. Since the shape vectors in this embodiment have different widths, the rate may preferably be expressed as number of pulses per sample. In this way the same rate dependent attenuation can be used for all bandwidths. An alternative solution, which is used in this embodiment, is to use a step size T in the table depending on the width of the band. Here, we use 4 different bandwidths in 4 different groups and hence require 4 step sizes. An example of step sizes is found in Table 1. Using the step size, the lookup value is obtained by using a rounding operation t(└R(b)·T┘), where └ ┘ represents rounding to the closest integer.

TABLE 1

Band group   Bandwidth   Step size T
1            8           4
2            16          4/3
3            24          2
4            34          1

Another example lookup table is given in Table 2.

TABLE 2

Band group   Bandwidth   Step size T
1            8           4
2            16          4/3
3            24          2
4            32          1
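The table lookup with a bandwidth-dependent step size might be sketched as follows. The step sizes are taken from Table 2, but the attenuation table itself is a made-up placeholder (the actual table is described as trained on audio data), and clamping the index to the table length is an assumption. Exact fractions are used so that the step size 4/3 rounds predictably.

```python
import math
from fractions import Fraction

# Step size T per bandwidth, as listed in Table 2.
STEP_SIZE = {8: Fraction(4), 16: Fraction(4, 3), 24: Fraction(2), 32: Fraction(1)}

# Hypothetical attenuation table t(.), rising towards 1 at higher rates.
T_TABLE = [min(1.0, 0.5 + 0.05 * i) for i in range(17)]

def table_index(num_pulses, bandwidth):
    """Round R(b) * T to the closest integer (halves round up)."""
    scaled = num_pulses * STEP_SIZE[bandwidth]
    return math.floor(scaled + Fraction(1, 2))

def rate_attenuation(num_pulses, bandwidth):
    """Look up t(round(R(b) * T)), clamped to the last table entry."""
    idx = min(table_index(num_pulses, bandwidth), len(T_TABLE) - 1)
    return T_TABLE[idx]
```

For example, 3 pulses in a band of width 16 give the index round(3 · 4/3) = 4.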

The estimated sparseness can be implemented as another lookup table u(R(b), p_(max)(b)) based on both the number of pulses R(b) and the height of the maximum pulse p_(max)(b). An example lookup table is shown in FIG. 8. The lookup table u serves as an accuracy measure A(b) for band b, i.e.:

A(b)=u(R(b),p_(max)(b))  (11)

It was noted that the approximation of g_(MSE) was more suitable for the lower frequency range from a perceptual perspective. For the higher frequencies the fine structure becomes less perceptually important and the matching of the energy or RMS value becomes vital. For this reason, the gain attenuation may be applied only below a certain band number b_(THR). In this case the gain correction g_(c)(b) will have an explicit dependence on the frequency band b. The resulting gain correction function can in this case be defined as:

$\begin{matrix}{{g_{c}(b)} = \begin{cases}{{t(R(b))} \cdot {A(b)}}, & {b < b_{THR}} \\ {1,} & \text{otherwise}\end{cases}} & (12)\end{matrix}$

The description up to this point may also be used to describe the essential features of the example embodiment of FIG. 4. Thus, in the embodiment of FIG. 4, the final synthesis X̂(b) is calculated as:

$\begin{matrix}{{\hat{X}(b)} = {\underbrace{{g_{c}(b)}\,{g_{RMS}(b)}\,{\hat{E}(b)}}_{\tilde{E}(b)}\,{\hat{N}(b)}}} & (13)\end{matrix}$
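Equations (12) and (13) combine into a short decoder-side routine. The sketch below is illustrative only: the table value t(R(b)), the accuracy measure A(b), the threshold and the envelope value are hypothetical numbers passed in by the caller, and the function names are assumptions.

```python
import math

def gain_correction(t_value, accuracy, band, b_thr):
    """Equation (12): attenuate only bands below the threshold band b_THR."""
    return t_value * accuracy if band < b_thr else 1.0

def synthesize_band(shape_hat, e_hat, g_c):
    """Equation (13): scale the decoded shape with the adjusted envelope."""
    g_rms = math.sqrt(len(shape_hat) / sum(p * p for p in shape_hat))
    adjusted_envelope = g_c * g_rms * e_hat
    return [adjusted_envelope * p for p in shape_hat]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

n_hat = [1, 0, 0, 0, -1, 0, 1, 0]
low_band = synthesize_band(n_hat, 2.0, gain_correction(0.8, 0.9, band=3, b_thr=10))
high_band = synthesize_band(n_hat, 2.0, gain_correction(0.8, 0.9, band=12, b_thr=10))
```

Because g_RMS normalizes the shape to unit RMS, the synthesized band ends up with an RMS of g_c·Ê(b): 1.44 for the attenuated low band and exactly the envelope value 2.0 for the high band in this example.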

As an alternative the function u(R(b), p_(max)(b)) may be implemented as a linear function of the maximum pulse height p_(max) and the allocated bit rate R(b), for example as:

u(R(b),p_(max)(b))=k·(p_(max)(b)−R(b))+1  (14)

where the inclination k is determined by:

$\begin{matrix}{\begin{aligned} k &= \frac{1 - \left( a_{\min} + {R(b)} \cdot \Delta a \right)}{{R(b)} - 1} \\ \Delta a &= \left( a_{\max} - a_{\min} \right)/{R(b)} \\ a_{\max} &= 1 - \frac{1 - a_{\min}}{{R(b)} - 1} \end{aligned}} & (15)\end{matrix}$

The function depends on the tuning parameter a_(min) which gives the initial attenuation factor for R(b)=1 and p_(max)(b)=1. The function is illustrated in FIG. 9, with the tuning parameter a_(min)=0.41. Typically u_(max)∈[0.7,1.4] and u_(min)∈[0, u_(max)]. In equation (14) u is linear in the difference between p_(max)(b) and R(b). Another possibility is to have different inclination factors for p_(max)(b) and R(b).
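A literal transcription of equations (14) and (15) into Python could look like the sketch below. The function name is hypothetical. Since the expressions divide by R(b)−1, the sketch only accepts R(b) ≥ 2 and raises an error otherwise; how the single-pulse case is handled in practice is not stated here, so this restriction is an assumption of the example.

```python
def linear_accuracy(num_pulses, p_max, a_min=0.41):
    """Equations (14)-(15): u = k * (p_max - R) + 1, transcribed as written."""
    if num_pulses < 2:
        # Assumption: the formulas divide by R(b) - 1, so R(b) = 1 is rejected here.
        raise ValueError("this sketch requires at least 2 pulses")
    a_max = 1.0 - (1.0 - a_min) / (num_pulses - 1)
    delta_a = (a_max - a_min) / num_pulses
    k = (1.0 - (a_min + num_pulses * delta_a)) / (num_pulses - 1)
    return k * (p_max - num_pulses) + 1.0

u_two_spread = linear_accuracy(2, 1)    # 2 pulses, both of height 1
u_five_spread = linear_accuracy(5, 1)   # 5 pulses, all of height 1
u_five_stacked = linear_accuracy(5, 5)  # 5 pulses stacked in one position
```

With a_min = 0.41 this gives 0.41 for two spread pulses, 0.8525 for five spread pulses, and 1 (no attenuation) when all five pulses are stacked.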

The bitrate for a given band may change drastically between adjacent frames. This may lead to fast variations of the gain correction. Such variations are especially critical when the envelope is fairly stable, i.e. the total changes between frames are quite small. This often happens for music signals which typically have more stable energy envelopes. To avoid that the gain attenuation introduces instability, an additional adaptation may be added. An overview of such an embodiment is given in FIG. 10, in which a stability meter 66 has been added to the gain adjustment apparatus 60 in the decoder 300.

The adaptation can, for example, be based on a stability measure of the envelope Ê(b). An example of such a measure is to compute the squared Euclidean distance between adjacent log₂ envelope vectors:

$\begin{matrix}{{{\Delta E}(m)} = {\frac{1}{N_{bands}}{\sum\limits_{b = 0}^{N_{bands} - 1}\;( {{\log_{2}{\hat{E}( {b,m} )}} - {\log_{2}{\hat{E}( {b,{m - 1}} )}}} )^{2}}}} & (16)\end{matrix}$

Here, ΔE(m) denotes the squared Euclidean distance between the envelope vectors for frame m and frame m−1. The stability measure may also be lowpass filtered to have a smoother adaptation:

ΔẼ(m)=αΔE(m)+(1−α)ΔE(m−1)  (17)

A suitable value for the forgetting factor α may be 0.1. The smoothened stability measure may then be used to create a limitation of the attenuation using, for example, a sigmoid function such as:

$\begin{matrix}{{g_{\min} = \frac{1}{1 + e^{C_{1}{({{\Delta\tilde{E}(m)} - C_{2}})} - C_{3}}}},} & (18)\end{matrix}$

where the parameters may be set to C₁=6, C₂=2 and C₃=1.9. It should be noted that these parameters are to be seen as examples, while the actual values may be chosen with more freedom. For instance:

-   C₁∈[1, 10]
-   C₂∈[1, 4]
-   C₃∈[−5, 10]

FIG. 11 illustrates an example of a mapping function from the stability measure ΔẼ(m) to the gain adjustment limitation factor g_(min). The above expression for g_(min) is preferably implemented as a lookup table or with a simple step function, such as:

$\begin{matrix}{g_{\min} = \begin{cases}{1,} & {{\Delta\tilde{E}(m)} < {{C_{3}/C_{1}} + C_{2}}} \\ {0,} & {{\Delta\tilde{E}(m)} \geq {{C_{3}/C_{1}} + C_{2}}}\end{cases}} & (19)\end{matrix}$

The attenuation limitation variable g_(min)∈[0,1] may be used to create a stability adapted gain modification g̃_(c)(b) as:

g̃_(c)(b)=max(g_(c)(b),g_(min))  (20)
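The stability adaptation of equations (16), (18) and (20) can be sketched as below, using the example parameter values C₁=6, C₂=2 and C₃=1.9. The function names are hypothetical, the envelope values are assumed to be strictly positive, and the smoothing of equation (17) is omitted for brevity, so the distance is passed to the sigmoid directly.

```python
import math

def envelope_distance(env_now, env_prev):
    """Equation (16): mean squared difference of the log2 envelopes of two frames."""
    diffs = [(math.log2(a) - math.log2(b)) ** 2 for a, b in zip(env_now, env_prev)]
    return sum(diffs) / len(diffs)

def attenuation_limit(distance, c1=6.0, c2=2.0, c3=1.9):
    """Equation (18): sigmoid mapping from the stability measure to g_min."""
    return 1.0 / (1.0 + math.exp(c1 * (distance - c2) - c3))

def stability_adapted_gain(g_c, g_min):
    """Equation (20): never attenuate below the limit g_min."""
    return max(g_c, g_min)

stable = envelope_distance([1.0, 2.0, 4.0], [1.0, 2.0, 4.0])   # identical frames
moving = envelope_distance([2.0, 4.0, 8.0], [1.0, 2.0, 4.0])   # one octave up

g_stable = stability_adapted_gain(0.6, attenuation_limit(stable))
g_unstable = stability_adapted_gain(0.6, attenuation_limit(10.0))
```

For a perfectly stable envelope the limit is close to 1, which effectively disables the attenuation; for a strongly changing envelope the limit approaches 0 and the original correction of 0.6 is kept.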

After the estimation of the gain, the final synthesis X̂(b) is calculated as:

$\begin{matrix}{{\hat{X}(b)} = {\underbrace{{{\tilde{g}}_{c}(b)}\,{g_{RMS}(b)}\,{\hat{E}(b)}}_{\tilde{E}(b)}\,{\hat{N}(b)}}} & (21)\end{matrix}$

In the described variations of embodiment 1 the union of the synthesized vectors X̂(b) forms the synthesized spectrum X̂, which is further processed using the inverse MDCT transform, windowed with the symmetric sine window and added to the output synthesis using the overlap-and-add strategy.

Embodiment 2

In another example embodiment, the shape is quantized using a QMF(Quadrature Mirror Filter) filter bank and an ADPCM (AdaptiveDifferential Pulse-Code Modulation) scheme for shape quantization. Anexample of a subband ADPCM scheme is the ITU-T G.722 [4]. The inputaudio signal is preferably processed in segments. An example ADPCMscheme is shown in FIG. 12, with an adaptive step size S. Here, theadaptive step size of the shape quantizer serves as an accuracy measurethat is already present in the decoder and does not require additionalsignaling. However, the quantization step size needs to be extractedfrom the parameters used by the decoding process and not from thesynthesized shape itself. An overview of this embodiment is shown inFIG. 14. However, before this embodiment is described in detail, anexample ADPCM scheme based on a QMF filter bank will be described withreference to FIGS. 12 and 13.

FIG. 12 illustrates an example of an ADPCM encoder and decoder system with an adaptive quantization step size. An ADPCM quantizer 70 includes an adder 72, which receives an input signal and subtracts an estimate of the previous input signal to form an error signal e. The error signal is quantized in a quantizer 74, the output of which is forwarded to the bitstream multiplexer 18, and also to a step size calculator 76 and a dequantizer 78. The step size calculator 76 adapts the quantization step size S to obtain an acceptable error. The quantization step size S is forwarded to the bitstream multiplexer 18, and also controls the quantizer 74 and the dequantizer 78. The dequantizer 78 outputs an error estimate ê to an adder 80. The other input of the adder 80 receives an estimate of the input signal which has been delayed by a delay element 82. This forms a current estimate of the input signal, which is forwarded to the delay element 82. The delayed signal is also forwarded to the step size calculator 76 and to (with a sign change) the adder 72 to form the error signal e.

An ADPCM dequantizer 90 includes a step size decoder 92, which decodes the received quantization step size S and forwards it to a dequantizer 94. The dequantizer 94 decodes the error estimate ê, which is forwarded to an adder 98, the other input of which receives the output signal from the adder delayed by a delay element 96.
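The signal flow of FIG. 12 can be illustrated with a much simplified sketch. The assumptions are considerable and should be read as such: the predictor is just the previous reconstructed sample (the delay element), the quantizer is uniform with a hypothetical `levels` clipping bound, and the step size is held fixed for the whole segment instead of being adapted by a step size calculator. The function names are illustrative only.

```python
def adpcm_encode(samples, step, levels=4):
    """Quantize the prediction error e with a uniform quantizer of size `step`."""
    indices = []
    prediction = 0.0  # output of the delay element
    for x in samples:
        error = x - prediction  # adder 72: input minus delayed estimate
        index = max(-levels, min(levels, round(error / step)))  # quantizer 74
        indices.append(index)
        # Dequantizer 78 and adder 80: track the same estimate as the decoder
        prediction += index * step
    return indices


def adpcm_decode(indices, step):
    """Rebuild the signal from the quantizer indices and the step size."""
    output = []
    prediction = 0.0  # output of the delay element
    for index in indices:
        prediction += index * step  # dequantized error added to delayed output
        output.append(prediction)
    return output
```

Because the encoder updates its prediction from the dequantized error rather than from the true input, encoder and decoder stay synchronized, and the reconstruction error per sample is bounded by the step size as long as the quantizer does not clip.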

FIG. 13 illustrates an example in the context of a subband ADPCM based audio encoder and decoder system. The encoder side is similar to the encoder side of the embodiment of FIG. 2. The essential differences are that the frequency transformer 30 has been replaced by a QMF (Quadrature Mirror Filter) analysis filter bank 100, and that fine structure quantizer 38 has been replaced by an ADPCM quantizer, such as the quantizer 70 in FIG. 12. The decoder side is similar to the decoder side of the embodiment of FIG. 2. The essential differences are that the inverse frequency transformer 50 has been replaced by a QMF synthesis filter bank 102, and that fine structure dequantizer 46 has been replaced by an ADPCM dequantizer, such as the dequantizer 90 in FIG. 12.

FIG. 14 illustrates an embodiment of the present technology in the context of a subband ADPCM based audio coder and decoder system. In order to avoid cluttering of the drawing, only the decoder side 300 is illustrated. The encoder side may be implemented as in FIG. 13.

Encoder of Embodiment 2

The encoder applies the QMF filter bank to obtain the subband signals. The RMS values of each subband signal are calculated and the subband signals are normalized. The envelope E(b), subband bit allocation R(b) and normalized shape vectors N(b) are obtained as in embodiment 1. Each normalized subband is fed to the ADPCM quantizer. In this embodiment the ADPCM operates in a forward adaptive fashion, and determines a scaling step S(b) to be used for subband b. The scaling step is chosen to minimize the MSE across the subband frame. In this embodiment the step is chosen by trying all possible steps and selecting the one which gives the minimum MSE:

$S(b) = \min\limits_{s} \frac{1}{BW(b)} \left( N(b) - Q(N(b), s) \right)^{T} \left( N(b) - Q(N(b), s) \right) \qquad (22)$

where Q(x, s) is the ADPCM quantizing function of the variable x using a step size of s. The selected step size may be used to generate the quantized shape:

{circumflex over (N)}(b)=Q(N(b),S(b))  (23)
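The exhaustive search of equations (22) and (23) may be sketched as below. The quantizing function used here is a stand-in, not the ADPCM quantizer itself: a clipped uniform scalar quantizer with a hypothetical `levels` bound, chosen only so that the search has a non-trivial optimum. The candidate step set and all names are likewise assumptions.

```python
def quantize(shape, step, levels=2):
    """Stand-in for Q(N(b), s): clipped uniform scalar quantization."""
    return [max(-levels, min(levels, round(x / step))) * step for x in shape]


def mean_squared_error(shape, step, quantizer=quantize):
    """Error term of equation (22); BW(b) is the length of the shape vector."""
    quantized = quantizer(shape, step)
    return sum((x - q) ** 2 for x, q in zip(shape, quantized)) / len(shape)


def select_step(shape, candidate_steps, quantizer=quantize):
    """Try every candidate step and keep the one with the smallest MSE."""
    return min(candidate_steps,
               key=lambda s: mean_squared_error(shape, s, quantizer))


shape = [0.9, -0.4, 0.1, 0.5]
best_step = select_step(shape, [0.25, 0.5, 1.0])
quantized_shape = quantize(shape, best_step)  # equation (23)
```

In this example the smallest step clips the largest coefficient and the largest step is too coarse for the small ones, so the middle candidate gives the lowest error.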

The quantizer indices from the envelope quantization and shape quantization are multiplexed into a bitstream to be stored or transmitted to a decoder.

Decoder of Embodiment 2

The decoder demultiplexes the indices from the bitstream and forwards the relevant indices to each decoding module. The quantized envelope Ê(b) and the bit allocation R(b) are obtained as in embodiment 1. The synthesized shape vectors {circumflex over (N)}(b) are obtained from the ADPCM decoder or dequantizer together with the adaptive step sizes S(b). The step sizes indicate an accuracy of the quantized shape vector, where a smaller step size corresponds to a higher accuracy and vice versa. One possible implementation is to make the accuracy A(b) inversely proportional to the step size using a proportionality factor γ:

$A(b) = \gamma \frac{1}{S(b)} \qquad (24)$

where γ should be set to achieve the desired relation. One possible choice is γ=S_(min) where S_(min) is the minimum step size, which gives accuracy 1 for S(b)=S_(min).

The gain correction factor g_(c) may be obtained using a mapping function:

g _(c)(b)=h(R(b),b)·A(b)  (25)

The mapping function h may be implemented as a lookup table based on the rate R(b) and frequency band b. This table may be defined by clustering the optimal gain correction values g_(MSE)/g_(RMS) by these parameters and computing the table entry by averaging the optimal gain correction values for each cluster.
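One way to picture equations (24) and (25) together with the table construction is the sketch below. The training-data layout (tuples of rate, band, g_MSE and g_RMS) and all function names are assumptions made for illustration; the choice γ = S_min follows the possibility mentioned above.

```python
from collections import defaultdict


def build_gain_table(training_data):
    """Average the optimal corrections g_MSE / g_RMS per (rate, band) cluster.

    training_data: iterable of (rate, band, g_mse, g_rms) tuples (assumed layout).
    """
    clusters = defaultdict(list)
    for rate, band, g_mse, g_rms in training_data:
        clusters[(rate, band)].append(g_mse / g_rms)
    return {key: sum(values) / len(values) for key, values in clusters.items()}


def accuracy(step, gamma):
    """Equation (24): accuracy inversely proportional to the step size."""
    return gamma / step


def gain_correction(table, rate, band, step, s_min):
    """Equation (25) with h as a table lookup and gamma = S_min."""
    return table[(rate, band)] * accuracy(step, s_min)


table = build_gain_table([(2, 0, 0.5, 1.0), (2, 0, 1.5, 2.0), (4, 0, 0.9, 1.0)])
g_c = gain_correction(table, rate=2, band=0, step=0.5, s_min=0.25)
```

A plain dictionary keyed on (rate, band) keeps the lookup trivial at decode time; the averaging is done once, offline.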

After the estimation of the gain correction, the subband synthesis {circumflex over (X)}(b) is calculated as:

$\hat{X}(b) = \underbrace{g_{c}(b)\, g_{RMS}(b)\, \hat{E}(n)}_{\tilde{E}(n)}\, \hat{N}(b) \qquad (26)$

The output audio frame is obtained by applying the synthesis QMF filter bank to the subbands.

In the example embodiment illustrated in FIG. 14 the accuracy meter 62 in the gain adjustment apparatus 60 receives the not yet decoded quantization step size S(b) directly from the received bitstream. An alternative, as noted above, is to decode it in the ADPCM dequantizer 90 and forward it in decoded form to the accuracy meter 62.

Further Alternatives

The accuracy measure could be complemented with a signal class parameter derived in the encoder. This may for instance be a speech/music discriminator or a background noise level estimator. An overview of a system incorporating a signal classifier is shown in FIGS. 15-16. The encoder side in FIG. 15 is similar to the encoder side in FIG. 2, but has been provided with a signal classifier 104. The decoder side 300 in FIG. 16 is similar to the decoder side in FIG. 4, but has been provided with a further signal class input to the accuracy meter 62.

The signal class could be incorporated in the gain correction for instance by having a class dependent adaptation. If we assume the signal classes are speech or music corresponding to the values C=1 and C=0 respectively, we can constrain the gain adjustment to be effective only during speech, i.e.:

$g_{c}(b) = \begin{cases} t(R(b)) \cdot A(b), & b < b_{THR} \wedge C = 1 \\ 1, & \text{otherwise} \end{cases} \qquad (27)$
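Equation (27) amounts to a single guarded product. The sketch below assumes that the attenuation t(R(b)) and the accuracy A(b) have already been computed and are passed in as numbers; the function and parameter names are illustrative.

```python
SPEECH = 1  # signal class C = 1
MUSIC = 0   # signal class C = 0


def class_dependent_gain_correction(attenuation, accuracy, band,
                                    band_threshold, signal_class):
    """Equation (27): correct only low bands of frames classified as speech."""
    if band < band_threshold and signal_class == SPEECH:
        return attenuation * accuracy
    return 1.0  # no correction for music or for bands at or above b_THR
```

For example, with attenuation 0.5 and accuracy 0.5 a speech frame in a band below the threshold gets a correction of 0.25, while the same band in a music frame is left at 1.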

In another alternative embodiment the system can act as a predictor together with a partially coded gain correction or compensation. In this embodiment the accuracy measure is used to improve the prediction of the gain correction or compensation such that the remaining gain error may be coded with fewer bits.

When creating the gain correction or compensation factor g_(c) one might want to do a trade-off between matching the RMS value or energy and minimizing the MSE. In some cases matching the energy becomes more important than an accurate waveform. This is for instance true for higher frequencies. To accommodate this, the final gain correction may, in a further embodiment, be formed by using a weighted sum of the different gain values:

$g_{c}^{\prime} = \frac{\beta g_{RMS} + (1 - \beta) g_{MSE}}{g_{RMS}} = \beta + (1 - \beta)\frac{g_{MSE}}{g_{RMS}} = \beta + (1 - \beta) g_{c} \qquad (28)$

where g_(c) is the gain correction obtained in accordance with one of the approaches described above. The weighting factor β can be made adaptive to e.g. the frequency, bitrate or signal type.
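The two ends of equation (28) can be checked against each other numerically. The sketch below uses illustrative function names and arbitrary example values; it treats g_c as the ratio g_MSE/g_RMS, as in the middle step of the equation.

```python
def blended_from_gains(g_rms, g_mse, beta):
    """Left-hand form of equation (28): weighted sum normalized by g_RMS."""
    return (beta * g_rms + (1.0 - beta) * g_mse) / g_rms


def blended_from_correction(g_c, beta):
    """Right-hand form of equation (28), using only the correction g_c."""
    return beta + (1.0 - beta) * g_c


g_rms, g_mse, beta = 2.0, 1.0, 0.5
direct = blended_from_gains(g_rms, g_mse, beta)
via_correction = blended_from_correction(g_mse / g_rms, beta)
```

With β = 1 the result is 1 (pure RMS matching, no correction) and with β = 0 it is g_c (pure MSE-oriented correction), so β interpolates between the two.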

The steps, functions, procedures and/or blocks described herein may be implemented in hardware using any conventional technology, such as discrete circuit or integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.

Alternatively, at least some of the steps, functions, procedures and/or blocks described herein may be implemented in software for execution by a suitable processing device, such as a microprocessor, Digital Signal Processor (DSP) and/or any suitable programmable logic device, such as a Field Programmable Gate Array (FPGA) device.

It should also be understood that it may be possible to reuse the general processing capabilities of the decoder. This may, for example, be done by reprogramming of the existing software or by adding new software components.

FIG. 17 illustrates an embodiment of a gain adjustment apparatus 60 in accordance with the present technology. This embodiment is based on a processor 110, for example a microprocessor, which executes a software component 120 for estimating the accuracy measure, a software component 130 for determining the gain correction, and a software component 140 for adjusting the gain representation. These software components are stored in memory 150. The processor 110 communicates with the memory over a system bus. The parameters {circumflex over (N)}(b), R(b), Ê(b) are received by an input/output (I/O) controller 160 controlling an I/O bus, to which the processor 110 and the memory 150 are connected. In this embodiment the parameters received by the I/O controller 160 are stored in the memory 150, where they are processed by the software components. Software components 120, 130 may implement the functionality of block 62 in the embodiments described above. Software component 140 may implement the functionality of block 64 in the embodiments described above. The adjusted gain representation {tilde over (E)}(b) obtained from software component 140 is outputted from the memory 150 by the I/O controller 160 over the I/O bus.

FIG. 18 illustrates an embodiment of gain adjustment in accordance with the present technology in more detail. An attenuation estimator 200 is configured to use the received bit allocation R(b) to determine a gain attenuation t(R(b)). The attenuation estimator 200 may, for example, be implemented as a lookup table or in software based on a linear equation such as equation (14) above. The bit allocation R(b) is also forwarded to a shape accuracy estimator 202, which also receives an estimated sparseness p_(max)(b) of the quantized shape, for example represented by the height of the highest pulse in the shape representation {circumflex over (N)}(b). The shape accuracy estimator 202 may, for example, be implemented as a lookup table. The estimated attenuation t(R(b)) and the estimated shape accuracy A(b) are multiplied in a multiplier 204. In one embodiment this product t(R(b))·A(b) directly forms the gain correction g_(c)(b). In another embodiment the gain correction g_(c)(b) is formed in accordance with equation (12) above. This requires a switch 206 controlled by a comparator 208, which determines whether the frequency band b is less than a frequency limit b_(THR). If this is the case, then g_(c)(b) is equal to t(R(b))·A(b). Otherwise g_(c)(b) is set to 1. The gain correction g_(c)(b) is forwarded to another multiplier 210, the other input of which receives the RMS matching gain g_(RMS)(b). The RMS matching gain g_(RMS)(b) is determined by an RMS matching gain calculator 212 based on the received shape representation {circumflex over (N)}(b) and corresponding bandwidth BW(b), see equation (4) above. The resulting product is forwarded to another multiplier 214, which also receives the shape representation {circumflex over (N)}(b) and the gain representation Ê(b), and forms the synthesis {circumflex over (X)}(b).
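The data flow of FIG. 18 can be traced in a short sketch. Several things are assumed here and may differ from the actual embodiment: the attenuation t(R(b)) and accuracy A(b) are taken as already computed inputs, the RMS matching gain is assumed to scale the decoded shape to unit RMS over the bandwidth (the exact form of the earlier equation is not reproduced here), and the zero-energy guard and all names are hypothetical.

```python
import math


def rms_matching_gain(shape):
    """Assumed form of g_RMS(b): gain that gives the decoded shape unit RMS."""
    bandwidth = len(shape)  # BW(b)
    energy = sum(x * x for x in shape)
    if energy == 0.0:
        return 0.0  # guard added for the sketch: an all-zero shape stays zero
    return math.sqrt(bandwidth / energy)


def synthesize_band(shape, envelope, attenuation, accuracy, band, band_threshold):
    """Follow multiplier 204, switch 206, and multipliers 210 and 214."""
    product = attenuation * accuracy                     # multiplier 204
    g_c = product if band < band_threshold else 1.0      # comparator 208, switch 206
    gain = g_c * rms_matching_gain(shape)                # multiplier 210
    return [gain * envelope * x for x in shape]          # multiplier 214


low_band = synthesize_band([2.0, 0.0, 0.0, 0.0], 3.0, 0.5, 0.5, band=0, band_threshold=10)
high_band = synthesize_band([2.0, 0.0, 0.0, 0.0], 3.0, 0.5, 0.5, band=12, band_threshold=10)
```

In the low band the single pulse is scaled by 0.25 relative to the uncorrected synthesis, while in the band above the limit the switch forces the correction to 1 and only the RMS matching gain and envelope apply.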

The stability detection described with reference to FIG. 10 may be incorporated into embodiment 2 as well as the other embodiments described above.

FIG. 19 is a flow chart illustrating the method in accordance with the present technology. Step S1 estimates an accuracy measure A(b) of the shape representation {circumflex over (N)}(b). The accuracy measure may, for example, be derived from shape quantization characteristics, such as R(b), S(b), indicating the resolution of the shape quantization. Step S2 determines a gain correction, such as g_(c)(b), {tilde over (g)}_(c)(b), g′_(c)(b), based on the estimated accuracy measure. Step S3 adjusts the gain representation Ê(b) based on the determined gain correction.

FIG. 20 is a flow chart illustrating an embodiment of the method in accordance with the present technology, in which the shape has been encoded using a pulse coding scheme and the gain correction depends on an estimated sparseness p_(max)(b) of the quantized shape. It is assumed that an accuracy measure has already been determined at step S1 (FIG. 19). Step S4 estimates a gain attenuation that depends on allocated bit rate. Step S5 determines a gain correction based on the estimated accuracy measure and the estimated gain attenuation. Thereafter the procedure proceeds to step S3 (FIG. 19) to adjust the gain representation.

FIG. 21 illustrates an embodiment of a network in accordance with the present technology. It includes a decoder 300 provided with a gain adjustment apparatus in accordance with the present technology. This embodiment illustrates a radio terminal, but other network nodes are also feasible. For example, if voice over IP (Internet Protocol) is used in the network, the nodes may comprise computers.

In the network node in FIG. 21 an antenna 302 receives a coded audio signal. A radio unit 304 transforms this signal into audio parameters, which are forwarded to the decoder 300 for generating a digital audio signal, as described with reference to the various embodiments above. The digital audio signal is then D/A converted and amplified in a unit 306 and finally forwarded to a loudspeaker 308.

Although the description above focuses on transform based audio coding, the same principles may also be applied to time domain audio coding with separate gain and shape representations, for example CELP coding.

It will be understood by those skilled in the art that various modifications and changes may be made to the present technology without departure from the scope thereof, which is defined by the appended claims.

ABBREVIATIONS

ADPCM Adaptive Differential Pulse-Code Modulation

AMR Adaptive MultiRate

AMR-WB Adaptive MultiRate WideBand

CELP Code Excited Linear Prediction

GSM-EFR Global System for Mobile communications—Enhanced FullRate

DSP Digital Signal Processor

FPGA Field Programmable Gate Array

IP Internet Protocol

MDCT Modified Discrete Cosine Transform

MSE Mean Squared Error

QMF Quadrature Mirror Filter

RMS Root-Mean-Square

VQ Vector Quantization

REFERENCES

-   [1] “ITU-T G.722.1 ANNEX C: A NEW LOW-COMPLEXITY 14 KHZ AUDIO CODING STANDARD”, ICASSP 2006.
-   [2] “ITU-T G.719: A NEW LOW-COMPLEXITY FULL-BAND (20 KHZ) AUDIO CODING STANDARD FOR HIGH-QUALITY CONVERSATIONAL APPLICATIONS”, WASPA 2009.
-   [3] U. Mittal, J. Ashley, E. Cruz-Zeno, “Low Complexity Factorial Pulse Coding of MDCT Coefficients using Approximation of Combinatorial Functions,” ICASSP 2007.
-   [4] “7 kHz Audio Coding Within 64 kbit/s” [G.722], IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 1988.

What is claimed is:
1. A method of operation by a gain adjustment apparatus, the method comprising: receiving an encoded audio signal comprising a set of gain values and a corresponding set of shape vectors, each gain value representing the energy of a frequency sub-band in a frequency transform of an input audio signal, and each corresponding shape vector representing a fine structure of the frequency transform in the frequency sub-band; determining an accuracy measure for each shape vector from corresponding shape quantization characteristics indicating a quantization resolution; determining a gain correction for each gain value as a function of the accuracy measure calculated for the corresponding shape vector; and adjusting each gain value according to the corresponding gain correction, to obtain corrected gain values, for use in decoding the encoded audio signal.
2. The method of claim 1, wherein each shape vector comprises a pulse vector and wherein determining the accuracy measure for the shape vector comprises calculating the accuracy measure as a function of the number of pulses allocated to the pulse vector, as said quantization resolution, and a maximum pulse height for the pulse vector, and wherein greater pulse allocations correspond to higher accuracy and smaller pulse allocations correspond to lower accuracy.
3. The method of claim 2, further comprising determining the accuracy measure for each shape vector as a further function of the number of pulses allocated to the pulse vector in relation to a bandwidth of the frequency sub-band corresponding to the shape vector.
4. The method of claim 1, wherein determining the gain correction for each gain value comprises obtaining a gain correction factor from a stored table of gain correction factors indexed as a function of accuracy measures, and wherein adjusting each gain value according to the corresponding gain correction comprises applying the corresponding gain correction factor to each gain value.
5. The method of claim 1, wherein determining the accuracy measure for each shape vector comprises obtaining the accuracy measure from a stored table of accuracy measures indexed as a function of quantization resolution.
6. The method of claim 1, wherein determining the accuracy measure for each shape vector comprises determining the accuracy measure as a linear function of an allocated bit rate used for shape representation.
7. The method of claim 1, wherein adjusting each gain value comprises scaling each gain value according to the corresponding gain correction, and wherein the scaling further depends on whether the encoded audio signal represents encoded speech or encoded music.
8. The method of claim 1, wherein adjusting each gain value according to the corresponding gain correction comprises storing, at least temporarily, each corrected gain value and the corresponding shape vector, for decoding.
9. A gain adjustment apparatus comprising: input circuitry configured to receive an encoded audio signal comprising a set of gain values and a corresponding set of shape vectors, each gain value representing the energy of a frequency sub-band in a frequency transform of an input audio signal, and each corresponding shape vector representing a fine structure of the frequency transform in the frequency sub-band; and gain correction circuitry configured to: determine an accuracy measure for each shape vector from corresponding shape quantization characteristics indicating a quantization resolution; determine a gain correction for each gain value as a function of the accuracy measure calculated for the corresponding shape vector; and adjust each gain value according to the corresponding gain correction, to obtain corrected gain values, for use in decoding the encoded audio signal.
10. The gain adjustment apparatus of claim 9, wherein each shape vector comprises a pulse vector and wherein the gain adjustment apparatus is configured to determine the accuracy measure for the shape vector by calculating the accuracy measure as a function of the number of pulses allocated to the pulse vector, as said quantization resolution, and a maximum pulse height for the pulse vector, and wherein greater pulse allocations correspond to higher accuracy and smaller pulse allocations correspond to lower accuracy.
11. The gain adjustment apparatus of claim 10, wherein the gain adjustment apparatus is configured to determine the accuracy measure for each shape vector as a further function of the number of pulses allocated to the pulse vector in relation to a bandwidth of the frequency sub-band corresponding to the shape vector.
12. The gain adjustment apparatus of claim 9, wherein the gain adjustment apparatus is configured to determine the gain correction for each gain value by obtaining a gain correction factor from a stored table of gain correction factors indexed as a function of accuracy measures, and to adjust each gain value according to the corresponding gain correction by applying the corresponding gain correction factor to each gain value.
13. The gain adjustment apparatus of claim 9, wherein the gain adjustment apparatus is configured to determine the accuracy measure for each shape vector by obtaining the accuracy measure from a stored table of accuracy measures indexed as a function of quantization resolution.
14. The gain adjustment apparatus of claim 9, wherein the gain adjustment apparatus is configured to determine the accuracy measure for each shape vector by determining the accuracy measure as a linear function of an allocated bit rate used for shape representation.
15. The gain adjustment apparatus of claim 9, wherein the gain adjustment apparatus is configured to adjust each gain value by scaling each gain value according to the corresponding gain correction, and to make the scaling further depend on whether the encoded audio signal represents encoded speech or encoded music.
16. The gain adjustment apparatus of claim 9, wherein the gain adjustment apparatus is configured to adjust each gain value according to the corresponding gain correction by storing, at least temporarily, each corrected gain value and the corresponding shape vector, for decoding.